A Performance Study of a Large-scale Data Collection Problem

Authors

Cheng-Fu Chou

Yung-Chun Wan

William C. Cheng

Leana Golubchik

Samir Khuller

This work was supported in part by the NSF ITR CCR0113192 and the NSF Digital Government EIA0091474 grants.
Author affiliations, in the order given in the paper's footnotes:
Dept. of Computer Science and Information Engineering, National Taiwan University. This work was partly done while the author was with the Department of Computer Science and UMIACS at the University of Maryland.
Department of Computer Science and UMIACS, University of Maryland at College Park.
TeleGIF, Marina del Rey, California. This work was partly done while the author was with the Department of Computer Science and UMIACS at the University of Maryland.
Computer Science Department and IMSC and ISI, University of Southern California. This work was partly done while the author was with the Department of Computer Science and UMIACS at the University of Maryland.

Abstract

We consider the problem of moving a large amount of data from several source hosts to a destination host over a wide-area network, i.e., a large-scale data collection problem. This problem is important because improvements in data collection times are crucial to the performance of many applications, such as wide-area uploads, high-performance computing, and data mining. Existing approaches to the large-scale data collection problem are (a) transferring data directly from the source hosts to the destination host using IP-prescribed routes (which we refer to as direct methods) or (b) using "best"-path type application-level re-routing techniques (which we refer to as non-coordinated methods). However, we believe that in the case of large-scale data collection applications, it is important to coordinate data transfers from multiple sources. More specifically, our coordinated method takes into consideration the transfer demands of all source hosts and then schedules all data transfers in parallel, using multiple paths existing between the source hosts and the destination host. All this is done at the application layer.
In this paper, we present a performance and robustness study of the different data collection methods, namely the direct, the non-coordinated, and the coordinated methods. Our results show that coordinated methods can perform significantly better than non-coordinated and direct methods under various types of network congestion conditions. We also show that coordinated methods are more robust than non-coordinated methods when the information about network conditions is inaccurate. Therefore, we believe that coordinated methods are a promising approach to large-scale data collection problems.
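
To make the difference between direct and coordinated collection concrete, the following is a minimal sketch on a toy network. It is not the authors' algorithm: it assumes a simple fluid model (no latency, fixed link bandwidths), and the host names, bandwidths (MB/s), and data sizes (MB) are invented for illustration. The direct method sends each source's data over its own link to the destination; the coordinated variant searches for the smallest time T in which all demands fit, allowing data to be relayed through other hosts, by checking feasibility with a maximum-flow computation.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Copy capacities and add zero-capacity reverse edges for the residual graph.
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0.0)
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 1e-9 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        # Walk back from t to s, then push the bottleneck amount.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        total += push

def direct_time(demand, bandwidth, dest):
    """Each source uses only its own link to dest; the slowest one decides."""
    return max(size / bandwidth[src][dest] for src, size in demand.items())

def coordinated_time(demand, bandwidth, dest, tol=1e-6):
    """Smallest T (by bisection) such that all data fits through the network."""
    total = sum(demand.values())

    def feasible(T):
        # In time T, a link of bandwidth bw can carry bw * T units of data.
        cap = {u: {v: bw * T for v, bw in nbrs.items()}
               for u, nbrs in bandwidth.items()}
        cap["SRC"] = dict(demand)  # hypothetical super-source feeding each host
        return max_flow(cap, "SRC", dest) >= total - 1e-6

    lo, hi = 0.0, direct_time(demand, bandwidth, dest)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Toy instance (assumed numbers): A has a slow link to D but a fast link to B.
bandwidth = {"A": {"D": 1.0, "B": 10.0}, "B": {"D": 10.0}}
demand = {"A": 100.0, "B": 10.0}

t_direct = direct_time(demand, bandwidth, "D")
t_coord = coordinated_time(demand, bandwidth, "D")
print(f"direct: {t_direct:.1f} s, coordinated: {t_coord:.1f} s")
```

In this toy instance the direct method needs 100 s because host A is limited by its 1 MB/s link, while the coordinated schedule finishes in about 10 s by relaying most of A's data through B. Real wide-area transfers involve changing bandwidths and store-and-forward delays, which this sketch deliberately ignores.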

Similar resources

A TWO-STAGE DAMAGE DETECTION METHOD FOR LARGE-SCALE STRUCTURES BY KINETIC AND MODAL STRAIN ENERGIES USING HEURISTIC PARTICLE SWARM OPTIMIZATION

In this study, an approach for damage detection of large-scale structures is developed by employing kinetic and modal strain energies and also Heuristic Particle Swarm Optimization (HPSO) algorithm. Kinetic strain energy is employed to determine the location of structural damages. After determining the suspected damage locations, the severity of damages is obtained based on variations of modal ...

Solving Re-entrant No-wait Flexible Flowshop Scheduling Problem; Using the Bottleneck-based Heuristic and Genetic Algorithm

In this paper, we study the re-entrant no-wait flexible flowshop scheduling problem with makespan minimization objective and then consider two parallel machines for each stage. The main characteristic of a re-entrant environment is that at least one job is likely to visit certain stages more than once during the process. The no-wait property describes a situation in which every job has its own ...

Solving a robust capacitated arc routing problem using a hybrid simulated annealing algorithm: A waste collection application

The urban waste collection is one of the major municipal activities that involves large expenditures and difficult operational problems. Also, waste collection and disposal have high expenses such as investment cost (i.e. vehicles fleet) and high operational cost (i.e. fuel, maintenance). In fact, making slight improvements in this issue lead to a huge saving in municipal consumption. Some inci...

Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation

In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...

A method to solve the problem of missing data, outlier data and noisy data in order to improve the performance of human and information interaction

Abstract Purpose: Errors in data collection and failure to pay attention to data that are noisy in the collection process for any reason cause problems in data-based analysis and, as a result, wrong decision-making. Therefore, solving the problem of missing or noisy data before processing and analysis is of vital importance in analytical systems. The purpose of this paper is to provide a metho...

Publication date: 2002
